AITopics | particle gradient flow

This paper considers the problem of optimizing over measures instead of parameters directly ( as is standard in ML), for differentiable predictors with convex loss. This is an infinite dimensional convex optimization problem. The paper considers instead optimizing with m particles (dirac deltas). As m tends to infinity this corresponds to optimizing over the measure space. Proposition 2.3 shows existence and uniqueness of the particle gradient flow for a given initialization.

global convergence, over-parameterized model, particle gradient flow, (10 more...)

Neural Information Processing Systems

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Batch Bayesian Optimization via Particle Gradient Flows

Crovini, Enrico, Cotter, Simon L., Zygalakis, Konstantinos, Duncan, Andrew B.

arXiv.org Artificial IntelligenceJan-9-2023

Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancilliary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this `inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2209.04722

Country: Europe (0.46)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Add feedback

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Chizat, Lénaïc, Bach, Francis

Neural Information Processing SystemsDec-31-2018

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.

artificial intelligence, gradient flow, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > North Sea > Southern North Sea (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Chizat, Lénaïc, Bach, Francis

Neural Information Processing SystemsDec-31-2018

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.

artificial intelligence, gradient flow, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe > France (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Chizat, Lenaic, Bach, Francis

arXiv.org Machine LearningMay-24-2018

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.

artificial intelligence, gradient flow, machine learning, (19 more...)

arXiv.org Machine Learning

1805.09545

Country: Europe > France (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback

Filters

Collaborating Authors

particle gradient flow

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Reviews: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Batch Bayesian Optimization via Particle Gradient Flows

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport